feat: add docker agent serve chat command (OpenAI-compatible API) #2510

Open

dgageot wants to merge 19 commits into docker:main from dgageot:board/add-docker-agent-serve-chat-command-0b138539

Conversation


dgageot commented Apr 25, 2026

Fixes #2502.

Exposes any docker-agent agent through an OpenAI-compatible HTTP server, so any tool that already speaks the Chat Completions protocol (Open WebUI, the official openai SDKs, ad-hoc curl scripts, etc.) can drive an agent without a custom integration.

Endpoints

Method  Path                   Notes
GET     /v1/models             Lists exposed agents as OpenAI models
POST    /v1/chat/completions   Runs the agent; supports stream: true (SSE) and false

Usage

docker agent serve chat ./agent.yaml                       # localhost:8083
docker agent serve chat ./team.yaml --agent reviewer       # pin one agent
docker agent serve chat agentcatalog/pirate --listen :9090
curl -sS -X POST http://127.0.0.1:8083/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"hi"}]}'

Design

  • The team is loaded once at startup and shared across requests. Each chat completion gets a fresh session and runtime.
  • The session is created with ToolsApproved=true and NonInteractive=true — there is no human in the loop. ElicitationRequestEvent is still explicitly declined to avoid hanging on the runtime's elicitation channel.
  • The model field of the request can pin a specific agent in a multi-agent team. If it doesn't match an exposed agent (e.g. clients that hard-code gpt-4), we silently fall back to the default agent and echo the requested model name back, so clients that match on the model field stay happy.
  • Streaming uses SSE in OpenAI's chat.completion.chunk format and ends with data: [DONE].
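
As a concrete illustration of the streaming contract (a sketch against the wire format above, assuming the default 127.0.0.1:8083 listen address — not code from this PR):

    package main

    import (
        "bufio"
        "fmt"
        "net/http"
        "strings"
    )

    func main() {
        body := strings.NewReader(`{"stream":true,"messages":[{"role":"user","content":"hi"}]}`)
        resp, err := http.Post("http://127.0.0.1:8083/v1/chat/completions", "application/json", body)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        scanner := bufio.NewScanner(resp.Body)
        for scanner.Scan() {
            data, ok := strings.CutPrefix(scanner.Text(), "data: ")
            if !ok {
                continue // skip the blank separator lines between events
            }
            if data == "[DONE]" { // the stream always ends with this sentinel
                break
            }
            fmt.Println(data) // each payload is a chat.completion.chunk JSON object
        }
    }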

Implementation

  • New cobra command cmd/root/chat.go (default 127.0.0.1:8083, --agent / --listen flags) wired into cmd/root/serve.go.
  • New pkg/chatserver package, split across:
    • server.go — Run, router, HTTP handlers, sseStream, error envelope
    • agent.go — agentPolicy, buildSession, runAgentLoop, sessionUsage
    • types.go — request/response shapes
  • Reuses openai.Model from github.com/openai/openai-go/v3 for /v1/models. Other SDK response types serialise too noisily with stdlib encoding/json (the SDK relies on its internal apijson package, which lives under internal/), so the chat-completion shapes are hand-rolled for clean output.
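
For illustration, a hand-rolled shape can stay this small (abridged sketch; the real definitions in types.go carry more fields, usage and streaming deltas included):

    // Abridged sketch: plain structs like these serialise cleanly with
    // stdlib encoding/json, which is the point of hand-rolling them.
    type chatCompletionResponse struct {
        ID      string   `json:"id"`
        Object  string   `json:"object"` // "chat.completion"
        Created int64    `json:"created"`
        Model   string   `json:"model"`
        Choices []choice `json:"choices"`
    }

    type choice struct {
        Index        int     `json:"index"`
        Message      message `json:"message"`
        FinishReason string  `json:"finish_reason"`
    }

    type message struct {
        Role    string `json:"role"`
        Content string `json:"content"`
    }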

Tests

  • Unit tests for session-building, agent-policy resolution, usage extraction.
  • HTTP-level tests via httptest for /v1/models shape, the three early-validation paths of /v1/chat/completions (bad JSON, empty messages, history without user), and writeError's status→type mapping.

Validation

  • mise lint — 0 issues
  • mise test — all packages green
  • Manual curl smoke test against examples/42.yaml: /v1/models returns the agent, error paths return correct OpenAI-shaped envelopes.

dgageot requested a review from a team as a code owner on April 25, 2026 at 18:00
dgageot force-pushed the board/add-docker-agent-serve-chat-command-0b138539 branch from 40d4ebc to 8711dac on April 25, 2026 at 20:25
trungutt previously approved these changes Apr 26, 2026
dgageot marked this pull request as draft on April 26, 2026 at 10:37
dgageot added a commit to dgageot/cagent that referenced this pull request Apr 27, 2026
dgageot force-pushed the board/add-docker-agent-serve-chat-command-0b138539 branch from 8711dac to 594ad2b on April 27, 2026 at 12:40

dgageot commented Apr 27, 2026

Update — expanded scope

I force-pushed this branch with 18 additional commits on top of the original feat: add docker agent serve chat command. Many of them implement the "easy wins" from the original PR's "Limitations / future work" section, plus a working Go example and a couple of bug fixes uncovered while reviewing the new code.

New commits (oldest → newest)

Example

  • examples: add minimal chat client for docker agent serve chat — examples/chat/main.go runs an interactive REPL against the server using the official github.com/openai/openai-go SDK, demonstrating the OpenAI compatibility end-to-end.

Hardening (trivial / opt-in, safe defaults)

  • chatserver: replace * CORS with --cors-origin flag — wildcard removed; CORS off by default.
  • chatserver: enforce max body size and per-request timeout — --max-request-size (1 MiB default) and --request-timeout (5 min default).
  • chatserver: collect every runtime ErrorEvent (errors.Join) — no more swallowed errors after the first.
  • chatserver: emit structured error events on streaming failures — proper finish_reason: "error" + error envelope, instead of [error: …] jammed into delta.content.
  • chatserver: parse and validate OpenAI sampling parameters — temperature, top_p, max_tokens, stop (string-or-array union) are now declared, range-checked, and rejected with 400 on bad input.

Auth & deployment

  • chatserver: add Bearer-token auth (--api-key) — opt-in static bearer token (constant-time compare; OPTIONS preflight + /openapi.json exempted).
  • chatserver: support comma-list and regex in --cors-origin — allow-list, ~regex patterns, scheme/path validation.

Performance

  • chatserver: support X-Conversation-Id for stateful sessions — opt-in LRU + TTL cache so clients don't have to resend the full history every turn (--conversations-max, --conversation-ttl); see the usage sketch after this list.
  • chatserver: pool runtimes per agent for warm reuse — small pool of warm runtimes per agent to avoid the per-request cost of runtime.New (--max-idle-runtimes).
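
The usage sketch referenced above (hypothetical client code; the conversation id is any stable string the client picks):

    import (
        "context"
        "net/http"
        "strings"
    )

    // With a stable X-Conversation-Id, each request body carries only the
    // newest user message; the server keeps the prior turns. The id
    // "my-session-42" is made up.
    func nextTurn(ctx context.Context) (*http.Response, error) {
        req, err := http.NewRequestWithContext(ctx, http.MethodPost,
            "http://127.0.0.1:8083/v1/chat/completions",
            strings.NewReader(`{"messages":[{"role":"user","content":"and then?"}]}`))
        if err != nil {
            return nil, err
        }
        req.Header.Set("Content-Type", "application/json")
        req.Header.Set("X-Conversation-Id", "my-session-42")
        return http.DefaultClient.Do(req)
    }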

Protocol surface

  • chatserver: surface agent tool calls as OpenAI tool_calls — agent-invoked tools are emitted in OpenAI's tool_calls shape on both streaming and non-streaming responses (informational; tools still execute server-side).
  • chatserver: serve /openapi.json for schema introspection — embedded OpenAPI 3.1 document; bypasses bearer auth.
  • chatserver: accept OpenAI multimodal content (text + image_url) — content accepts the union of string and typed-parts arrays; image parts now reach the runtime via chat.MultiContent.

Bug fixes (found by review pass)

  • fix(chatserver): always store conversation after request — maybeStoreConversation used to skip Put for existing conversations, so if the entry was evicted by another request mid-flight, the updated session was lost.
  • refactor(chatserver): remove unused isNew parameter (follow-up).
  • test(chatserver): add test for conversation restore after eviction (follow-up).
  • chatserver: restore doc comment on chatCompletion — fixes a doc-comment regression introduced by the eviction fix.
  • chatserver: serialize requests sharing an X-Conversation-Id — concurrent requests sharing a conversation id used to share the same *session.Session and race on it. We now reject the second concurrent request with 409 Conflict, surfacing the misuse instead of producing a garbled transcript. Race-detector clean.

CI

  • go build ./... — clean.
  • go test -race ./pkg/chatserver/... ./cmd/root/... — clean.
  • golangci-lint run ./pkg/chatserver/... ./cmd/root/... ./examples/chat/... — 0 issues.

Breaking changes

None on the wire (the API only gains new optional features). For programmatic Go callers of chatserver.Run, the signature has changed from
Run(ctx, agentFilename, agentName, runConfig, ln) to Run(ctx, agentFilename, opts Options, ln); nothing outside this PR uses it.

Happy to split this into separate PRs if reviewers prefer; commit messages are written to be cherry-pickable.

Assisted-By: docker-agent

dgageot added 12 commits April 27, 2026 15:46
Expose any docker-agent agent through an OpenAI-compatible HTTP
server, so tools that already speak the Chat Completions protocol
(Open WebUI, the official `openai` SDKs, ad-hoc curl scripts, etc.)
can drive an agent without any custom integration.

Endpoints:
  GET  /v1/models             — lists exposed agents as OpenAI models
  POST /v1/chat/completions   — runs the agent; supports stream: true
                                (Server-Sent Events) and false

The team is loaded once at startup and shared across requests; each
chat completion gets a fresh session and runtime. Tool calls and
elicitation prompts are auto-handled (this is a non-interactive
endpoint). The `model` field can pin a specific agent in a multi-
agent team, or is ignored and the team's default agent runs.

Implementation notes:

- New `cmd/root/chat.go` cobra command (default 127.0.0.1:8083,
  --agent / --listen flags) wired into `cmd/root/serve.go`.
- New `pkg/chatserver` package, split into:
  - server.go — Run, router, HTTP handlers, sseStream, errors
  - agent.go  — agentPolicy, buildSession, runAgentLoop, sessionUsage
  - types.go  — request/response shapes
- Reuses `openai.Model` from github.com/openai/openai-go/v3 for
  /v1/models. Other OpenAI SDK response types serialise too noisily
  with stdlib `encoding/json` (the SDK relies on its internal
  `apijson` package which we can't import), so request/response
  shapes are hand-rolled for clean output.
- Defensive event handling in runAgentLoop: ToolsApproved=true and
  NonInteractive=true mean the runtime never blocks for confirmation
  in normal flow, but ElicitationRequestEvent must still be answered
  or the runtime would hang on its dedicated channel.

Tests cover session-building, agent-policy, error-envelope shape,
and the three early-validation paths of /v1/chat/completions via
httptest. Validated with `mise lint` (0 issues), `mise test` (all
packages green), and a curl smoke test against examples/42.yaml.

Fixes docker#2502

Assisted-By: docker-agent
Demonstrates the OpenAI-compatible HTTP server introduced in PR docker#2510.

Uses the official github.com/openai/openai-go SDK pointed at the local
chat server's /v1 base URL and runs an interactive REPL with streaming,
history retention, and graceful Ctrl-C shutdown.

Run `docker agent serve chat ./agent.yaml` in one terminal, then
`go run ./examples/chat` in another.

Assisted-By: docker-agent
The chat server used to set `Access-Control-Allow-Origin: *` on every
response, which makes it unsafe to expose on anything other than
loopback. Replace the wildcard with an explicit per-server allow-list
of one origin and disable the CORS middleware entirely when the flag
is empty.

- Introduce `chatserver.Options` so future improvements can extend the
  server configuration without breaking the `Run` signature on each
  change.
- Add `--cors-origin` flag to `docker agent serve chat`. Default empty
  = no CORS headers emitted.
- Update tests; fix three pre-existing `noctx` lint failures in
  handlers_test.go that surfaced when the PR was rebased onto current
  main.

Assisted-By: docker-agent
Hostile or buggy clients could previously stream gigabytes into the
chat completions endpoint or hold a goroutine open indefinitely on a
slow upstream model. Cap both via Echo middleware:

- `BodyLimit` defaults to 1 MiB (configurable via
  `--max-request-size`). Oversized bodies now return 413 instead of
  being silently buffered.
- A new `requestTimeoutMiddleware` wraps `c.Request().Context()` in
  `context.WithTimeout` so model + tool calls + SSE streaming all
  share a single deadline. Default 5 minutes, configurable via
  `--request-timeout`.

Both limits are exposed on `chatserver.Options` (`MaxRequestBytes`,
`RequestTimeout`); zero values fall back to package defaults.

Tests cover oversized body rejection and deadline propagation through
the middleware chain.
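
For illustration, the timeout half can be as small as this sketch
(requestTimeout is a stand-in name; the real middleware may differ in
detail):

    import (
        "context"
        "time"

        "github.com/labstack/echo/v4"
    )

    func requestTimeout(d time.Duration) echo.MiddlewareFunc {
        return func(next echo.HandlerFunc) echo.HandlerFunc {
            return func(c echo.Context) error {
                // One deadline shared by model calls, tool calls and SSE writes.
                ctx, cancel := context.WithTimeout(c.Request().Context(), d)
                defer cancel()
                c.SetRequest(c.Request().WithContext(ctx))
                return next(c)
            }
        }
    }

The body cap is Echo's stock middleware, registered alongside it:
`e.Use(middleware.BodyLimit("1M"), requestTimeout(5*time.Minute))`.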

Assisted-By: docker-agent
Previously runAgentLoop would record only the first ErrorEvent and
drop every subsequent one on the floor while still draining the
stream. That made debugging a multi-error run frustrating: only the
earliest symptom was ever surfaced, even though later events often
held the actual root cause (a model timeout followed by a tool call
that couldn't connect, for instance).

Switch to a slice of errors and join them with `errors.Join` at the
end. The handler's behaviour for callers is unchanged when a single
error occurs; multi-error runs now surface a wrapped error whose
`Unwrap() []error` makes each cause inspectable.
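
Sketched, with a stand-in event type (the real runtime API differs):

    import "errors"

    type errorEvent struct{ Err error } // stand-in for the runtime's ErrorEvent

    // drain records every error instead of stopping at the first one.
    func drain(events <-chan any) error {
        var errs []error
        for event := range events {
            if ev, ok := event.(errorEvent); ok {
                errs = append(errs, ev.Err) // record, keep draining
                continue
            }
            // ... handle content / tool-call events ...
        }
        return errors.Join(errs...) // nil when empty; Unwrap() []error otherwise
    }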

Assisted-By: docker-agent
Until now a runtime error mid-stream was injected into the assistant
content as `[error: ...]` and the stream still closed with
`finish_reason: "stop"`. Clients matching on the OpenAI protocol had
no programmatic way to tell a successful completion apart from a
failed one.

Switch to OpenAI's actual on-the-wire shape: emit a separate
`data: {"error": {...}}` envelope, then terminate the stream with
`finish_reason: "error"` before the `[DONE]` sentinel. Successful
runs continue to terminate with `finish_reason: "stop"`.

Add a unit test on the new `sseStream.sendError` covering the wire
format.
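
An error frame writer in this shape might look like the following
sketch (not the real sseStream.sendError):

    import (
        "encoding/json"
        "fmt"
        "net/http"
    )

    func sendError(w http.ResponseWriter, msg string) {
        payload, _ := json.Marshal(map[string]any{
            "error": map[string]string{"message": msg, "type": "server_error"},
        })
        fmt.Fprintf(w, "data: %s\n\n", payload)
        if f, ok := w.(http.Flusher); ok {
            f.Flush() // push the frame before finish_reason "error" and [DONE]
        }
    }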

Assisted-By: docker-agent
OpenAI clients regularly send `temperature`, `top_p`, `max_tokens`,
and `stop` on every chat completion request. The server used to drop
them silently because the request struct didn't declare them, so
typos and out-of-range values went unnoticed until the upstream
provider eventually returned an opaque error several seconds later.

- Add `Temperature`, `TopP`, `MaxTokens`, `Stop` to
  `ChatCompletionRequest` so the OpenAPI schema matches what the
  wire protocol allows.
- `Stop` is JSON-flexible: clients send either a single string or an
  array, and OpenAI accepts both. Custom `UnmarshalJSON` handles the
  union shape.
- `validateSamplingParams` range-checks the new fields and rejects
  bad input with a 400 invalid_request_error, matching how OpenAI
  itself behaves.

Plumbing these values through the runtime to the model layer
requires per-request overrides that don't exist today; that work is
tracked separately. Validating up front is the user-visible win and
unblocks future plumbing.
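
The stop union unmarshaller is small; a sketch (naming in the real
types.go may differ):

    import (
        "encoding/json"
        "fmt"
    )

    type stopSequences []string

    func (s *stopSequences) UnmarshalJSON(data []byte) error {
        var one string
        if err := json.Unmarshal(data, &one); err == nil {
            *s = stopSequences{one} // "stop": "END"
            return nil
        }
        var many []string
        if err := json.Unmarshal(data, &many); err != nil {
            return fmt.Errorf(`"stop" must be a string or array of strings: %w`, err)
        }
        *s = many // "stop": ["END", "STOP"]
        return nil
    }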

Assisted-By: docker-agent
The chat server is unauthenticated by default, which is fine on
loopback but unsafe anywhere else. Add an opt-in static bearer-token
gate so the server can be safely bound to a LAN interface.

- `chatserver.Options.APIKey`: when non-empty, every request to /v1/*
  must carry `Authorization: Bearer <token>` or it is rejected with
  401. Empty preserves the previous unauthenticated behaviour.
- `bearerAuthMiddleware` uses `subtle.ConstantTimeCompare` to dodge
  timing-side-channel leaks. CORS preflight (OPTIONS) is exempted so
  browsers can negotiate before sending the auth header.
- `--api-key` and `--api-key-env` flags expose the option from the
  CLI; the env-var form keeps secrets out of process listings.

Tests cover missing/wrong/correct tokens and the OPTIONS exemption.
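
The core check, sketched (assuming the Options.APIKey wiring above):

    import (
        "crypto/subtle"
        "net/http"
        "strings"
    )

    func bearerOK(r *http.Request, apiKey string) bool {
        token, ok := strings.CutPrefix(r.Header.Get("Authorization"), "Bearer ")
        if !ok {
            return false
        }
        // ConstantTimeCompare is only constant-time for equal lengths, so
        // the length check leaks nothing beyond the key's length.
        return len(token) == len(apiKey) &&
            subtle.ConstantTimeCompare([]byte(token), []byte(apiKey)) == 1
    }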

Assisted-By: docker-agent
Until now the server was strictly stateless: every chat completion
request rebuilt a fresh session from the messages array, so clients
paid the tokenization cost of replaying the full history on every
turn. That works but is wasteful for long conversations.

Add an opt-in conversation cache:

- `chatserver.Options.ConversationsMaxSessions` enables an in-memory
  LRU keyed by the `X-Conversation-Id` request header.
  `Options.ConversationTTL` (default 30 min) bounds idle lifetime;
  expired entries are evicted lazily on access and on Put.
- When a request carries a known id, the server reuses the existing
  session and only appends the latest user message from the request
  body. The session already has the prior turns. When the id is
  unknown (or the header is absent), the server falls back to the
  previous behaviour and builds a session from scratch.
- New `--conversations-max` and `--conversation-ttl` CLI flags
  expose the feature. Default 0 keeps the old stateless behaviour.

The cache implementation is a simple map + mutex with O(n) LRU
scan; that's appropriate for the small caches typical for this
feature, and avoids pulling in a new dependency.

Tests cover Put/Get, TTL expiry, LRU eviction, Delete, and the new
appendLatestUser helper.
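
A minimal sketch of the store's Put path (the real code also expires
TTL-stale entries and stores the project's *session.Session):

    import (
        "sync"
        "time"
    )

    type entry struct {
        sess     any // *session.Session in the real code
        lastUsed time.Time
    }

    type conversations struct {
        mu  sync.Mutex
        max int
        m   map[string]*entry
    }

    func newConversations(max int) *conversations {
        return &conversations{max: max, m: map[string]*entry{}}
    }

    func (c *conversations) Put(id string, sess any) {
        c.mu.Lock()
        defer c.mu.Unlock()
        if _, exists := c.m[id]; !exists && len(c.m) >= c.max {
            var oldest string // O(n) LRU scan: fine for small caches
            for k, e := range c.m {
                if oldest == "" || e.lastUsed.Before(c.m[oldest].lastUsed) {
                    oldest = k
                }
            }
            delete(c.m, oldest)
        }
        c.m[id] = &entry{sess: sess, lastUsed: time.Now()}
    }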

Assisted-By: docker-agent
Every chat completion request used to call `runtime.New` from
scratch: that resolves the agent's tools, builds per-agent hook
executors, and allocates per-runtime resume/elicitation channels.
On a busy server those allocations show up in profiles.

Add an opt-in pool so a small number of warm runtimes per agent can
be reused across requests:

- `chatserver.Options.MaxIdleRuntimes` (default 4 via the
  `--max-idle-runtimes` flag) bounds the idle pool size per agent.
  0 disables pooling entirely and restores the original "fresh
  runtime per request" behaviour.
- `runtimePool.Get` returns a recycled runtime when one is idle, or
  creates a new one. `Put` returns it to the pool on completion;
  overflow is dropped on the floor (the team owns the toolsets, so
  nothing leaks).
- A runtime is *not* safe for concurrent `RunStream` calls (its
  resume/elicitation channels are per-runtime), so the pool hands
  out at most one borrow per runtime at a time. Concurrency comes
  from holding multiple runtimes per agent.
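
One plausible shape, sketched with a buffered channel per agent (the
Runtime type and constructor are stand-ins):

    type Runtime struct{ /* stands in for the real runtime type */ }

    type runtimePool struct {
        idle  chan *Runtime // capacity = MaxIdleRuntimes
        newRT func() (*Runtime, error)
    }

    // A runtime is either in the channel (idle) or held by exactly one
    // request, which gives the at-most-one-borrow guarantee for free.
    func (p *runtimePool) Get() (*Runtime, error) {
        select {
        case rt := <-p.idle: // warm runtime: skip the runtime.New cost
            return rt, nil
        default:
            return p.newRT()
        }
    }

    func (p *runtimePool) Put(rt *Runtime) {
        select {
        case p.idle <- rt:
        default: // pool full: drop the overflow, nothing leaks
        }
    }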

Assisted-By: docker-agent
The previous commit only accepted a single literal origin. Real
deployments often need to allow several front-ends or all subdomains
of a known SaaS. Extend the flag's grammar:

- comma-separated entries form an explicit allow-list, each matched
  exactly;
- entries prefixed with `~` are compiled as Go regex and matched
  against the request's `Origin` header at request time;
- the literal `*` wildcard is preserved for the (rare) cases where
  the operator really wants it;
- literal entries are validated up front: scheme must be http/https,
  no path/query/fragment, no missing host. Mistakes are caught at
  startup rather than producing silent allow-none behaviour at
  runtime.

When the spec parses cleanly to nothing usable, the middleware is
left unregistered and a slog.Error documents the misconfiguration.

Tests cover the parser's accept/reject set and exercise allow-list +
regex routing through the real Echo middleware.
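
The grammar, sketched (literal-origin validation elided):

    import (
        "fmt"
        "regexp"
        "strings"
    )

    func parseCORSSpec(spec string) (literals []string, patterns []*regexp.Regexp, err error) {
        for _, raw := range strings.Split(spec, ",") {
            entry := strings.TrimSpace(raw)
            switch {
            case entry == "":
                continue
            case strings.HasPrefix(entry, "~"): // e.g. ~^https://.*\.example\.com$
                re, err := regexp.Compile(strings.TrimPrefix(entry, "~"))
                if err != nil {
                    return nil, nil, fmt.Errorf("bad origin pattern %q: %w", entry, err)
                }
                patterns = append(patterns, re)
            default: // literal origin or the explicit "*" wildcard
                literals = append(literals, entry)
            }
        }
        return literals, patterns, nil
    }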

Assisted-By: docker-agent
When the agent invokes a tool, clients had no way to see what
happened: tools ran inside the runtime, the assistant's eventual
text output sometimes referenced them but often didn't, and the
streaming protocol carried only the model's plain content. That's
fine for a black-box transcript but useless for a chat UI that
wants to render "🔧 calling search(query=…)" badges.

Use OpenAI's standard `tool_calls` shape on both response styles:

- Add `ToolCallReference` (mirrors OpenAI's tool_call entry) with
  `index`, `id`, `type`, `function.{name,arguments}`.
- `ChatCompletionMessage.ToolCalls` populated on the non-streaming
  response so the assistant message lists every tool the agent
  invoked.
- `ChatCompletionStreamDelta.ToolCalls` carries one tool per delta
  in streaming mode. The runtime hands us complete arguments, so
  one chunk per call is sufficient (vs. OpenAI's incremental
  argument streaming, which clients accumulate either way).
- `runAgentLoop` now takes an `agentEmit` struct with
  `onContent` and `onToolCall` hooks instead of a single content
  callback. Both handlers fill in their respective hooks; missing
  ones are no-ops.

Tools still execute server-side; this commit is purely about
client observability. Surfacing results back through the protocol
(so clients could intercept / replay them) is left for a future
change.
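
The wire shape, sketched as Go structs (exact field names in types.go
are assumptions):

    type ToolCallReference struct {
        Index    int          `json:"index"`
        ID       string       `json:"id"`
        Type     string       `json:"type"` // always "function"
        Function toolFunction `json:"function"`
    }

    type toolFunction struct {
        Name      string `json:"name"`
        Arguments string `json:"arguments"` // complete JSON: one chunk per call
    }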

Assisted-By: docker-agent
dgageot force-pushed the board/add-docker-agent-serve-chat-command-0b138539 branch from 594ad2b to 7f16975 on April 27, 2026 at 13:48
dgageot added 7 commits April 27, 2026 15:56
Add a static OpenAPI 3.1 document describing /v1/models,
/v1/chat/completions, the new tool_calls fields, the
X-Conversation-Id header, and the bearer-auth security scheme.

- The spec is hand-written and embedded with `//go:embed`. That
  keeps it easy to review (it's plain JSON, not generated noise),
  trivial to update when the API changes, and free of generation
  steps in the build.
- A new `GET /openapi.json` route serves the spec verbatim.
- `bearerAuthMiddleware` exempts /openapi.json so introspection
  tooling can discover the API even on locked-down deployments —
  there's no secret in the spec, only the shape of the API.

Tests cover both the document shape (correct paths advertised) and
the auth bypass.
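
The wiring is essentially this sketch (handler details are
assumptions):

    import (
        _ "embed"
        "net/http"

        "github.com/labstack/echo/v4"
    )

    //go:embed openapi.json
    var openAPISpec []byte

    func registerOpenAPI(e *echo.Echo) {
        e.GET("/openapi.json", func(c echo.Context) error {
            return c.JSONBlob(http.StatusOK, openAPISpec) // served verbatim
        })
    }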

Assisted-By: docker-agent
OpenAI's chat protocol lets the `content` field of a message be
either a string or an array of typed parts:

    "content": [
      {"type": "text", "text": "What is in this picture?"},
      {"type": "image_url", "image_url": {"url": "..."}}
    ]

The chat server used to drop the parts variant on the floor: the
field was typed as `string`, so multi-part requests deserialised
to an empty content and the request was rejected as having "no
user message". That made the server unable to serve any
vision-capable agent.

- Replace the plain `Content string` with a JSON-union
  (un)marshaller. `Content` still carries a flat-text view for
  string-form content and for the concatenated text of parts; a
  new `Parts []ContentPart` field holds the typed entries when the
  array shape is used. Existing Go callers (and every test that
  still writes `Content: "..."`) keep working unchanged.
- `convertParts` translates the wire shape to the runtime's
  `chat.MessagePart` union (text + image_url), so the model
  provider sees the actual image. Unknown part types are dropped
  gracefully so future part kinds degrade rather than 500.
- `appendLatestUser` (used by X-Conversation-Id continuation) gets
  the same multi-part path.
- The OpenAPI spec advertises the union shape and the new
  ContentPart schema.

Tests cover string/array round-trips, image_url plumbing into the
session, and (still passing) all the pre-existing behaviour.
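
A sketch of the union unmarshaller (details such as the text-join
separator are assumptions):

    import (
        "encoding/json"
        "strings"
    )

    type ContentPart struct {
        Type     string `json:"type"` // "text" or "image_url"
        Text     string `json:"text,omitempty"`
        ImageURL *struct {
            URL string `json:"url"`
        } `json:"image_url,omitempty"`
    }

    type messageContent struct {
        Content string        // flat-text view, always populated
        Parts   []ContentPart // typed entries when the array shape is used
    }

    func (m *messageContent) UnmarshalJSON(data []byte) error {
        var s string
        if err := json.Unmarshal(data, &s); err == nil { // plain-string form
            m.Content = s
            return nil
        }
        if err := json.Unmarshal(data, &m.Parts); err != nil { // array of parts
            return err
        }
        var texts []string
        for _, p := range m.Parts {
            if p.Type == "text" {
                texts = append(texts, p.Text)
            }
        }
        m.Content = strings.Join(texts, "\n")
        return nil
    }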

Assisted-By: docker-agent
When a conversation was evicted from the LRU cache while a request was
still processing it, the updated session was not stored back, because
maybeStoreConversation only called Put when isNew=true.

This caused conversation state to be lost when:
1. Request R1 retrieves conversation C from cache (isNew=false)
2. R1 processes the request, updating the session
3. Meanwhile, C is evicted due to LRU policy
4. R1 finishes and calls maybeStoreConversation(C, sess, false)
5. Since isNew=false, Put was not called
6. The updated session is lost

Fix: Always call Put, regardless of isNew flag. This ensures the
updated session is stored and refreshes the lastUsed timestamp,
preventing premature eviction of active conversations.

The Put operation is idempotent and safe to call multiple times for
the same conversation ID.

Assisted-By: docker-agent
The isNew flag was used to decide whether to call Put on the
conversation store, but after the previous fix, we always call Put
regardless of whether the conversation is new or existing.

This commit removes the now-unused isNew parameter from
resolveSession and maybeStoreConversation, simplifying the code.

Assisted-By: docker-agent
Add a test that verifies a conversation evicted from the LRU cache
while a request is processing it can still be stored back after the
request completes.

This test validates the fix in commit 9563a43 which ensures
maybeStoreConversation always calls Put, preventing loss of session
state when a conversation is evicted during request processing.

Assisted-By: docker-agent
The previous fix accidentally deleted the doc-comment header line
on `(*server).chatCompletion`, leaving a dangling fragment
("// non-streaming OpenAI ChatCompletion object.") detached from
the function it documents.

Assisted-By: docker-agent
Concurrent requests with the same X-Conversation-Id share the same
`*session.Session` pointer (the conversation cache hands out the
same instance to every caller), so two simultaneous runtime
RunStream calls would interleave message appends, send overlapping
prompts to the model, and produce a garbled transcript.

Although `session.Session` has internal mutex protection on
Messages, the agent loop reads-then-writes (decide what to send,
append model output) so per-field synchronisation isn't enough —
the whole turn must be atomic with respect to other turns on the
same id.

Reject the second concurrent request with 409 Conflict instead of
trying to serialise it on the server. That:

- Surfaces the misuse to the caller immediately (vs. mysterious
  interleaving),
- Keeps server-side resources bounded (no queue, no parked
  goroutines),
- Matches how OpenAI's own conversation API expects clients to
  use the protocol (one request at a time per conversation).

Empty conversation id and nil lock-set are no-ops, so callers
without the feature enabled keep their old behaviour.

The OpenAPI spec advertises the new 409 response. Tests cover
acquire/release semantics, nil/empty no-ops, and a race-detector-
friendly stress test that proves at most one holder of the same
id at a time.
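
A per-id try-lock set along these lines (names are assumptions):

    import "sync"

    type idLocks struct {
        mu   sync.Mutex
        held map[string]struct{}
    }

    func newIDLocks() *idLocks { return &idLocks{held: map[string]struct{}{}} }

    // tryAcquire reports whether id was free; nil receiver and empty id
    // are no-ops so disabled deployments keep their old behaviour.
    func (l *idLocks) tryAcquire(id string) bool {
        if l == nil || id == "" {
            return true
        }
        l.mu.Lock()
        defer l.mu.Unlock()
        if _, busy := l.held[id]; busy {
            return false // handler answers 409 Conflict
        }
        l.held[id] = struct{}{}
        return true
    }

    func (l *idLocks) release(id string) {
        if l == nil || id == "" {
            return
        }
        l.mu.Lock()
        defer l.mu.Unlock()
        delete(l.held, id)
    }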

Assisted-By: docker-agent
dgageot force-pushed the board/add-docker-agent-serve-chat-command-0b138539 branch from 7f16975 to 0e1c5a8 on April 27, 2026 at 13:58
dgageot marked this pull request as ready for review on April 27, 2026 at 14:01